Faster index for property matching
Abstract
In this paper, we revisit the Property Matching problem studied by Amir et al. [Property Matching and Weighted Matching, CPM 2006] and present a better indexing scheme for it. The data structure of Amir et al., namely PST, requires O(n log |Σ| + n log log n) construction time and O(m log |Σ| + K) query time, where n and m are the lengths of the text and the pattern, respectively, Σ is the alphabet, and K is the output size. In contrast, the construction time of our data structure, namely IDS PIP, is dominated by suffix tree construction and is hence O(n) for alphabets consisting of natural numbers from 1 to a polynomial in n, and O(n log σ) otherwise, where σ = min(n, |Σ|). The query time is the same as that of PST. IDS PIP also has the advantage that it can be built on either a suffix tree or a suffix array, and it retains the capability of answering normal pattern matching queries.
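To make the problem concrete, here is a minimal Python sketch of property matching over a suffix array. It is not the IDS PIP structure from the paper and does not reach the stated bounds: the suffix array is built by naive sorting, and the query filters every ordinary occurrence rather than reporting only the K valid ones. The problem definition it uses is an assumption taken from the usual formulation, which the abstract does not spell out: a property is a set of intervals over the text, and an occurrence counts only if it lies entirely inside one interval. The names `build_index`, `property_match`, and `reach` are hypothetical.

```python
def build_index(text, intervals):
    """Build a naive suffix array plus a per-position 'reach' array.

    intervals: (start, end) pairs, 0-based and inclusive (assumed format).
    reach[i] is the largest end of any interval covering position i, or -1.
    """
    n = len(text)
    # Naive construction: sort suffix start positions lexicographically.
    suffix_array = sorted(range(n), key=lambda i: text[i:])

    reach = [-1] * n
    by_start = sorted(intervals)
    best_end = -1  # largest end among intervals starting at or before i
    j = 0
    for i in range(n):
        while j < len(by_start) and by_start[j][0] <= i:
            best_end = max(best_end, by_start[j][1])
            j += 1
        reach[i] = best_end if best_end >= i else -1
    return suffix_array, reach


def property_match(text, suffix_array, reach, pattern):
    """Return sorted start positions where pattern occurs inside one interval."""
    m = len(pattern)
    if m == 0:
        return []

    def prefix(k):
        start = suffix_array[k]
        return text[start:start + m]

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Keep only occurrences that end within the interval covering their start.
    occurrences = suffix_array[first:lo]
    return sorted(i for i in occurrences if i + m - 1 <= reach[i])


if __name__ == "__main__":
    text = "abababab"
    sa, reach = build_index(text, [(0, 3), (4, 7)])
    print(property_match(text, sa, reach, "aba"))  # [0, 4]
```

In this example "aba" occurs at positions 0, 2, and 4, but the occurrence at 2 spans both intervals and is discarded. Dropping the final filter gives ordinary pattern matching on the same suffix array.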
Similar resources
Improved Skips for Faster Postings List Intersection
Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function, and query analyzer are the main components of an information retrieval system. Document retrieval systems are fundamentally based on Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...
Fast Least Square Matching
Least square matching (LSM) is one of the most accurate image matching methods in photogrammetry and remote sensing. The main disadvantage of LSM is its high computational complexity, due to the large size of the observation equations. To address this problem, this paper presents a novel method called fast least square matching (FLSM). The main idea of the proposed FLSM is decreasing t...
FAMOUS: Fast Approximate string Matching using OptimUm search Schemes
Finding approximate occurrences of a pattern in a text using a full-text index is a central problem in bioinformatics and has been extensively researched. The introduction of practical bidirectional indices has opened new possibilities for solving the problem as they allow the search to be started from anywhere within the pattern and extended in both directions. In particular, use of search sch...
Faster Filters for Approximate String Matching
We introduce a new filtering method for approximate string matching called the suffix filter. It has some similarity with well-known filtration algorithms, which we call factor filters, and which are among the best practical algorithms for approximate string matching using a text index. Suffix filters are stronger, i.e., produce fewer false matches than factor filters. We demonstrate experiment...
Journal: Inf. Process. Lett.
Volume: 105
Publication date: 2008